{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "welsh-campaign",
   "metadata": {},
   "source": [
    "# Cholesky 分解中下三角矩阵的导数"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "general-marina",
   "metadata": {},
   "source": [
    "> 创建时间：2021-03-17"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "recorded-collect",
   "metadata": {},
   "source": [
    "这份文档简单地学习 Cholesky 分解中下三角矩阵的导数。这是在实现 RI-SCF/MP2 的解析导数时所遇到的问题。\n",
    "\n",
    "这篇文档的学习与参考文献与复现对象是 Murray [^Murray-Murray.arXiv.2016]。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "domestic-congo",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "mexican-cleveland",
   "metadata": {},
   "source": [
    "## Cholesky 分解回顾"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "completed-dutch",
   "metadata": {},
   "source": [
    "对于任意实对称正定矩阵 $\\mathbf{S} \\in \\mathbb{R}^{n \\times n}$，其 Cholesky 分解可以通过下式给出：\n",
    "\n",
    "$$\n",
    "\\mathbf{S} = \\mathbf{L} \\mathbf{L}^\\dagger \\; \\text{or} \\; S_{PQ} = L_{PR} L_{QR}\n",
    "$$\n",
    "\n",
    "其中，$\\mathbf{L}$ 是下三角矩阵。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ready-overview",
   "metadata": {},
   "outputs": [],
   "source": [
    "n = 10\n",
    "S = np.cov(np.random.randn(n, 2 * n))\n",
    "L = np.linalg.cholesky(S)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "amended-visiting",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.allclose(L @ L.T, S)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "soviet-starter",
   "metadata": {},
   "source": [
    "我们不讨论 Cholesky 分解的实现方式，只需要知道结论即可。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aggressive-rates",
   "metadata": {},
   "source": [
    "## Cholesky 分解矩阵 $\\mathbf{L}$ 的数值导数"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "great-possible",
   "metadata": {},
   "source": [
    "现在假定 $\\mathbf{S}$ 是关于外部参量 $x$ 的函数矩阵，并且 $\\partial_x \\mathbf{S}$ 是已知且对称的。在这种情况下，我们希望求取 $\\partial_x \\mathbf{L}$。我们令 $\\partial_x \\mathbf{S}$ 的变量名是 `dS`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "extra-parade",
   "metadata": {},
   "outputs": [],
   "source": [
    "dS = np.cov(np.random.randn(n, n))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "agricultural-vacation",
   "metadata": {},
   "source": [
    "数值导数很容易地通过数值查分方法给出：当数值导数间隔 $h$ 很小时 (譬如 1e-6)，那么下述近似关系成立：\n",
    "\n",
    "$$\n",
    "(\\mathbf{L} + h \\partial_x \\mathbf{L}) (\\mathbf{L} + h \\partial_x \\mathbf{L})^\\dagger \\simeq \\mathbf{S} + h \\partial_x \\mathbf{S}\n",
    "$$\n",
    "\n",
    "我们令通过上述方法求得的 $\\partial_x \\mathbf{L}$ 的变量名是 `ndL`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "cubic-transparency",
   "metadata": {},
   "outputs": [],
   "source": [
    "h = 1e-7\n",
    "ndL = (np.linalg.cholesky(S + h * dS) - L) / h"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "russian-journey",
   "metadata": {},
   "source": [
    "## Cholesky 分解矩阵 $\\mathbf{L}$ 的解析导数"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "documented-productivity",
   "metadata": {},
   "source": [
    "通过链式法则，可以知道\n",
    "\n",
    "$$\n",
    "\\partial_x \\mathbf{S} = \\partial_x \\mathbf{L} \\mathbf{L}^\\dagger + \\mathbf{L} \\partial_x \\mathbf{L}^\\dagger\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "checked-casting",
   "metadata": {},
   "source": [
    "对等式两边同时左乘 $\\mathbf{L}^{-1}$ 并右乘 $\\mathbf{L}^{-\\dagger}$，得到\n",
    "\n",
    "$$\n",
    "\\mathbf{L}^{-1} \\partial_x \\mathbf{S} \\mathbf{L}^{-\\dagger} = \\mathbf{L}^{-1} \\partial_x \\mathbf{L} + \\partial_x \\mathbf{L}^\\dagger \\mathbf{L}^{-\\dagger}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "approved-burke",
   "metadata": {},
   "source": [
    "我们能知道 $\\mathbf{L}^{-1}$ 与 $\\partial_x \\mathbf{L}$ 都是下三角矩阵，因此它们的乘积也是下三角矩阵。同理，$\\partial_x \\mathbf{L}^\\dagger \\mathbf{L}^{-\\dagger}$ 是上三角矩阵。这两个矩阵相互呈转置关系，因此对角线上的值是相等的。\n",
    "\n",
    "因此，我们构造下述作用关系 (或者等价地，矩阵)\n",
    "\n",
    "$$\n",
    "\\Phi_{ij} = \\left\\{\n",
    "\\begin{aligned}\n",
    "& 1 && i < j \\\\\n",
    "& 1/2 && i = j \\\\\n",
    "& 0 && i > j\n",
    "\\end{aligned}\n",
    "\\right.\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "waiting-combine",
   "metadata": {},
   "outputs": [],
   "source": [
    "F = np.zeros((n, n))\n",
    "for i in range(n):\n",
    "    F[i, :i] = 1\n",
    "    F[i, i] = 1/2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "wrapped-surfing",
   "metadata": {},
   "source": [
    "那么利用上下三角的对称性，下式的下三角部分成立：\n",
    "\n",
    "$$\n",
    "\\boldsymbol{\\Phi} \\odot (\\mathbf{L}^{-1} \\partial_x \\mathbf{S} \\mathbf{L}^{-\\dagger}) = \\mathbf{L}^{-1} \\partial_x \\mathbf{L}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "lesser-eligibility",
   "metadata": {},
   "source": [
    "对上式左乘 $\\mathbf{L}$，立即得到\n",
    "\n",
    "$$\n",
    "\\partial_x \\mathbf{L} = \\mathbf{L} \\boldsymbol{\\Phi} \\odot (\\mathbf{L}^{-1} \\partial_x \\mathbf{S} \\mathbf{L}^{-\\dagger})\n",
    "$$\n",
    "\n",
    "为了程序书写方便，额外定义 `Linv` $\\mathbf{L}^{-1}$。注意到点乘 $\\odot$ 的运算优先级比矩阵乘法高，但在 numpy 中点乘与矩阵乘法的运算优先级相同，因此要多加一层括号。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "nominated-trauma",
   "metadata": {},
   "outputs": [],
   "source": [
    "Linv = np.linalg.inv(L)\n",
    "dL = L @ (F * (Linv @ dS @ Linv.T))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "lesser-approach",
   "metadata": {},
   "source": [
    "在适当的阈值下，数值与解析导数的误差相近。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "molecular-internship",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.allclose(dL, ndL, rtol=1e-5, atol=1e-6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "alleged-auction",
   "metadata": {},
   "source": [
    "## $\\mathbf{L}$ 的解析导数的快速实现"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "introductory-sullivan",
   "metadata": {},
   "source": [
    "由于矩阵求逆是 $O(n^3)$ 运算量，计算耗时相当大；因此较为廉价的方法是利用求解线性问题，避免直接求逆。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "civil-squad",
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.linalg import solve_triangular\n",
    "from functools import partial\n",
    "st = partial(solve_triangular, lower=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "organizational-buffalo",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.allclose(L @ (F * st(L, st(L, dS.T).T)), dL, rtol=1e-5, atol=1e-6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "recent-valuable",
   "metadata": {},
   "source": [
    "我们现在考虑较大的矩阵 (1000 维度)："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "worthy-fraud",
   "metadata": {},
   "outputs": [],
   "source": [
    "n = 1000\n",
    "S = np.cov(np.random.randn(n, 2 * n))\n",
    "L = np.linalg.cholesky(S)\n",
    "dS = np.cov(np.random.randn(n, n))\n",
    "F = np.zeros((n, n))\n",
    "for i in range(n):\n",
    "    F[i, :i] = 1\n",
    "    F[i, i] = 1/2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "monetary-pledge",
   "metadata": {},
   "source": [
    "其计算耗时可以估计如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "incorporate-thinking",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "67.6 ms ± 456 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 10\n",
    "# with inverse\n",
    "Linv = np.linalg.inv(L)\n",
    "dL = L @ (F * (Linv @ dS @ Linv.T))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "upper-session",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "36.5 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 10\n",
    "# without inverse\n",
    "dL = L @ (F * st(L, st(L, dS.T).T))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "employed-balance",
   "metadata": {},
   "source": [
    "[^Murray-Murray.arXiv.2016]: Murray, I. Differentiation of the Cholesky Decomposition, *arXiv*: [1602.07527](https://arxiv.org/abs/1602.07527), 2016."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
